Abstract

This work presents a cyclic dependency analysis for stream-based programs. Specifically, we focus on the cyclo-static dataflow (CSDF) programming model with control messages through teleport messaging, as implemented in the StreamIt framework. Unlike existing cyclic dependency analyses, we allow overlapped teleport messages. An overlapped teleport message is one that traverses actors that themselves transmit teleport messages, which can complicate the topology of a stream graph with teleport messages. The challenge in this work is therefore to decide whether such stream graphs are feasible in the presence of such complex teleport messages. Our analysis addresses this challenge by first ensuring that the stream graph with teleport messages is feasible, and then computing an execution schedule for the CSDF graph in the presence of complex overlapped teleport messaging constraints. Consequently, our analysis accepts a larger class of CSDF stream graphs with complex teleport messaging topologies for execution.

General Terms Languages, Semantics, Design

Keywords Streaming, Dependency analysis, Scheduling, Deadlock detection.

1. Introduction

Streaming applications are an important class of applications common in today's electronic systems. Examples of streaming applications include image, video, and voice processing. To facilitate precise and natural expression of streaming applications, researchers have proposed several streaming languages [?] that allow programmers to faithfully model streaming applications. These languages employ high-level domain abstractions instead of low-level languages such as C to enable portability and automatic optimizations for a variety of target architectures, including novel multicore platforms. For example, synchronous dataflow (SDF) [13], cyclo-static dataflow (CSDF) [?], multidimensional synchronous dataflow (MDSDF) [?], and Kahn process networks (KPN) [?] are models of computation well suited for modeling and synthesis of streaming applications.

Among the above models of computation for streaming applications, SDF and its derivatives, such as CSDF and MDSDF, are more static than KPN; consequently, compilers can optimize applications in those domains for buffer space, scheduling, and balanced mapping onto multicore platforms more easily than general KPN applications. However, the static nature of SDF and its derivatives makes it difficult to implement more dynamic streaming applications, such as those that change the rates at which a task produces tokens, or the parameters used to compute with data. Several efforts have tried to address this issue, such as parameterized dataflow [2], used in modeling image processing systems [28].

However, as streaming algorithms become increasingly dynamic and complicated, in order to deliver higher quality of service while keeping the underlying hardware affordable and power-efficient, the original abstractions of streaming languages may no longer capture all new complexities in a natural way. To solve this problem, language designers should devise new language extensions that express these complexities more conveniently. Teleport messaging (TMG) in the StreamIt language [30] is an example.

1.1 StreamIt Language and Compiler

StreamIt [?] is a language for streaming applications based on the CSDF [?] programming model, a generalization of the Synchronous Dataflow (SDF) [?] model of computation. The language exploits the inherent task-level parallelism and predictable communication properties of CSDF to partition and map a stream program onto different multicore architectures [?]. A StreamIt application is a directed graph of autonomous actors connected via FIFO buffers whose sizes are predictable owing to the static production and consumption rates of the actors. This static feature of CSDF enables compilers to optimize and transform programs in order to deploy them efficiently onto different architectures. The CSDF model of computation is suitable for expressing regular and repetitive computations. However, CSDF makes it difficult to express dynamic streaming algorithms because it requires periodic and static schedules. As a result, dynamic streaming algorithms require substantial modifications to the structure of the streaming program itself, which makes using CSDF a complex task for such applications.

1.2 Control Messages

In contrast to the high-frequency, regular data messages typically modeled in CSDF, control messages are infrequent messages sent between actors. Control messages are necessary to implement more dynamic streaming algorithms; for example, they can be used to adjust the protocol in use or the compression rate. Manually integrating infrequent, low-rate control messages into frequent, high-rate streams would complicate the overall structure of an application, making it difficult for users to maintain and debug, and potentially reducing its efficiency. It is useful to separate control from data computation, as in [?], so that compiler optimization methods can be applied to generate more efficient code.
Figure 1. Adding dynamicities to an FIR computation. (a) FIR example. (b) New weights are attached to data tokens and processed at actors before processing data. (c) New weights are sent to actors via teleport messages before the arrival of data.


Thies et al. [30] give a TMG model for distributed stream programs. TMG is a mechanism that implements control messages for stream graphs. The TMG mechanism is designed not to interfere with the structure and scheduling of the original dataflow graph; a key advantage of TMG is therefore that it incorporates control messages without requiring the designer to restructure the stream graph, recompute production and consumption rates, or further complicate the program code of the actors. However, control messages must still be timed precisely relative to data messages. For instance, a set of new parameters specified in a control message should take effect only on some designated data messages. This requires methods for synchronizing data messages and control messages. Moreover, the structure of stream graphs with TMG exposes dependencies that allow automated analytical techniques to reason about the timing of control messages. Naturally, stream graph compilers can implement these analysis techniques. Users can also change the latency of messages without changing the structure of the stream graph.

Let us illustrate the problem with an example. Consider the Finite Impulse Response (FIR) example from [30], shown in Figure 1(a). FIR is a common kind of filter in digital signal processing, where the output of the filter is a convolution of the input sequence with a finite sequence of coefficients. The FIR example is composed of a Source, a Printer, and 64 Multiply and Add actors. Each Multiply actor holds a single coefficient w, called a tap weight, of the FIR filter.
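For reference, the convolution computed by the chain of Multiply and Add actors can be written directly. The Python sketch below is ours and is not StreamIt code; it assumes a plain list of input samples, treats `weights[k]` as the tap weight w of the k-th Multiply actor, and takes samples before the start of the stream to be zero.

```python
def fir(samples, weights):
    """Direct-form FIR filter: y[n] = sum_k weights[k] * samples[n - k].

    Each product weights[k] * samples[n - k] corresponds to the work of one
    Multiply actor; the running sum corresponds to the chain of Add actors.
    Samples before the start of the stream are treated as 0.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, w in enumerate(weights):
            if n - k >= 0:
                acc += w * samples[n - k]
        out.append(acc)
    return out

# An impulse input reproduces the tap weights, as expected of a convolution.
print(fir([1, 0, 0, 0], [0.5, 0.25, 0.125]))  # prints [0.5, 0.25, 0.125, 0.0]
```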

Now suppose that at some iteration during execution, the Source actor detects some condition and decides to change the tap weights of the Multiply actors. The new set of weights should be used only to compute with data produced by the Source actor during and after its current execution.

One way to ensure that the new coefficients are used only with the appropriate data is to attach the new set of tap weights to the data packets sent from the source, as in Figure 1(b). In the figure, each time a data token with an attached message containing new weights arrives, each Multiply actor detaches the message, updates its weight, and uses the new weight to compute with the data token that just arrived. However, this approach is not efficient: it changes the structure of the stream graph and of the data packets, and it would require an aggressive compiler analysis to optimize the program and to minimize the communication buffer sizes between actors in the stream graph.

In contrast, with the teleport approach illustrated in Figure 1(c), the Source actor can send teleport messages (TMs) containing the new weights to each Multiply actor before the data tokens that need to be computed with the new weights arrive. This approach provides a clean separation between control execution and data computation. This separation makes programs easier to maintain and debug, and it also helps avoid the error-prone task of manually embedding control information in data tokens and processing it there.

The TMG mechanism requires synchronization between data tokens and control messages, so that a control message is handled just before the computation of the appropriate data token. The theory behind the StreamIt TMG synchronization method provides an SDEP function to accomplish this synchronization. However, the SDEP function alone is not powerful enough to enable the StreamIt compiler to handle the case where several TMs overlap in a stream graph. This limitation hinders the deployment of more complicated stream programs in StreamIt.


1.3 Circular Dependencies

Figure 1(c) uses a simplified notion of TMs, where the latency of the message is implicitly set to 0. Latency constraints are needed to specify which instance of an actor can receive TMs from which instance of another actor. An integer parameter called latency, annotating a TM, is used to achieve this. In Figure 1(c), all latencies are implicitly zero. This means that the n-th execution of Source may send a TM to the n-th (but no earlier nor later) execution of each of the Multiply actors.

In general, latencies can be different from zero, as shown in Figure 2. It is then possible that no schedule exists that delivers TMs with the desired latencies. The left side of Figure 2 shows an example where the set of latency constraints is not satisfiable. Let us explain the example. Let us denote by $A_m$ the m-th execution of actor A. Suppose that actor A is currently at its n-th execution. Then, we have the following constraints:

• The latency constraint imposed by TMs sent from actor D to actor A requires that $A_{n+1}$ wait for possible TMs from $D_n$ before it can execute. In other words, $A_{n+1}$ depends on $D_n$ and has to execute after $D_n$. We denote this as
$$D_n \prec A_{n+1}.$$

• As $B_{n+1}$ consumes one token produced by $A_{n+1}$, we have $A_{n+1} \prec B_{n+1}$.




2011/8/29 


• We assume that multiple executions of a single actor must proceed in sequential order, so $B_{n+1} \prec B_{n+2}$, and hence, of course, $B_{n+1} \prec B_{n+10}$.

• Again, the latency imposed by teleport messages sent from actor B to actor C constrains $C_n$ to wait for a possible TM from $B_{n+10}$; therefore, $B_{n+10} \prec C_n$.

• Finally, $D_n$ consumes one token produced by $C_n$; therefore,
$$C_n \prec D_n.$$

Summing up, we have the following set of dependency constraints for the example:
$$D_n \prec A_{n+1} \prec B_{n+1} \prec B_{n+10} \prec C_n \prec D_n.$$
We can see that these dependency constraints create a cycle, so no evaluation order exists and the system is deadlocked.


Figure 2. Overlapping (left) and non-overlapping (right) scenarios of teleport messages.

Two factors create cyclic dependencies among actor executions in a stream graph: i) the structure of the stream graph, and ii) the latencies of teleport messages. Let us take an example to illustrate the importance of these two factors.

Now we keep the same graph structure as on the left side of Figure 2; however, we suppose that the latency for TMs between actor B and actor C is 0. In this case, there exists a valid evaluation order: $A_n \prec B_n \prec C_n \prec D_n \prec A_{n+1} \prec B_{n+1} \prec \dots$

We can see that for the same stream graph structure, different TM latencies can result in completely different situations: the first is deadlocked, while the second has a valid schedule.
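Both scenarios can be checked mechanically by unrolling a finite window of executions and attempting a topological sort. The Python sketch below is only a brute-force illustration under stated assumptions, not the analysis proposed in this paper. It assumes that the left side of Figure 2 is a pipeline A → B → C → D in which every actor consumes and produces one token per firing, encodes the constraints listed above as edges u → v meaning $u \prec v$, uses 0-based execution indices, and introduces a hypothetical parameter `lag` so that $C_n$ waits for $B_{n+\mathrm{lag}}$ (lag = 10 in the deadlocked case, lag = 0 in the valid one). A finite window can miss cycles longer than the window, which is one reason to build a finite dependency graph for the infinite execution sequence instead.

```python
from collections import defaultdict, deque

def build_graph(lag, horizon):
    """Unrolled executions X_0 .. X_{horizon-1} of the pipeline A -> B -> C -> D.

    An edge u -> v means that u must execute before v.
    lag: C_n must wait for a possible TM from B_{n+lag} (hypothetical parameter).
    """
    nodes = [(actor, n) for actor in "ABCD" for n in range(horizon)]
    succ = defaultdict(list)

    def edge(u, v):
        # keep only edges whose endpoints fall inside the unrolled window
        if u[1] < horizon and v[1] < horizon:
            succ[u].append(v)

    for n in range(horizon):
        for actor in "ABCD":
            edge((actor, n), (actor, n + 1))   # causality: X_n before X_{n+1}
        for up, down in ("AB", "BC", "CD"):
            edge((up, n), (down, n))           # data: one token per firing
        edge(("D", n), ("A", n + 1))           # control: TM from D to A
        edge(("B", n + lag), ("C", n))         # control: TM from B to C
    return nodes, succ

def topological_order(nodes, succ):
    """Kahn's algorithm; returns None when the graph contains a cycle."""
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for v in succ[u]:
            indeg[v] += 1
    ready = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order if len(order) == len(nodes) else None

print(topological_order(*build_graph(lag=10, horizon=30)))      # None: deadlock
print(topological_order(*build_graph(lag=0, horizon=30))[:5])
# [('A', 0), ('B', 0), ('C', 0), ('D', 0), ('A', 1)]
```

With lag = 10 the cycle $D_0 \prec A_1 \prec \dots \prec B_{10} \prec C_0 \prec D_0$ lies inside the window, so no order exists; with lag = 0 the sort reproduces the interleaved order given above.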

Thies et al. [30] call the graph structure on the left of Figure 2 "overlapping constraints," because the paths between actors involved in TMs have some actors in common. The compiler simply rejects graph structures with overlapping TMs regardless of the message latencies, even if the latencies could result in valid schedules, as is the case when the latency between actor B and actor C is 0. The StreamIt compiler only allows non-overlapping TMs, as on the right side of Figure 2. This conservative approach reflects an underdeveloped theory of execution dependency in the StreamIt compiler.

The first contribution of our paper is a method for checking circular dependencies of distributed stream programs in the presence of overlapping TMs: we introduce static finite dependency graphs for infinite sequences of executions and then propose an algorithm that constructs such graphs directly from stream graphs. The second contribution of this paper is to show how to find an execution order for a stream graph, when there are no circular dependencies, by solving a linear program. We have implemented our checking and ordering methods as a backend for the StreamIt compiler.


2. Background

2.1 Programming Model


Figure 3. Cyclo-Static Dataflow model of computation

The CSDF/SDF model of computation captures the semantics of streaming applications and enables several optimization and transformation techniques for buffer sizes [23], partitioning and mapping of stream graphs onto multicore architectures [?], and computational methods [10, 18].

In the CSDF model of computation, a stream program, given as a graph, is a set of actors communicating through FIFO channels. Each actor has a set of input and output ports. Each channel connects an output port of an actor to an input port of another actor. Each actor cyclically executes through a set of execution steps. At each step, it consumes a fixed number of tokens from each of its input channels and produces a fixed number of tokens to each of its output channels. This model of computation is interesting because it allows static scheduling with bounded buffer space.

Figure 3 shows an example of a CSDF stream graph. Actor A has two output ports. Each time actor A executes, it produces 2 tokens on its lower port and 1 token on its upper port. Actor B consumes 1 token and produces 2 tokens each time it executes. Actor E alternately consumes 0 and 1 token on its upper port, and 1 and 0 tokens on its lower port, each time it executes. The theory of CSDF and SDF provides algorithms to compute the number of times each actor has to execute within one iteration of the whole stream graph, so that the total number of tokens produced on each channel between two actors equals the total number of tokens consumed. In other words, the number of tokens on each channel between actors remains unchanged after one iteration of the whole stream graph. For example, in one iteration of the stream graph in Figure 3, actors A, B, C, D, E have to execute 3, 3, 2, 2, 5 times, respectively. A possible schedule for one iteration of the whole stream graph is 3(A), 3(B), 2(C), 2(D), 5(E). This basic schedule can be repeated iteratively. As we can see with this basic schedule, the number of tokens on each channel remains the same after one iteration of the whole stream graph. For instance, on the channel between B and D, in one iteration B produces 3 · 2 tokens while D consumes 2 · 3 tokens.
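The balance equations behind such repetition counts can be solved with rational arithmetic. The Python sketch below is a minimal illustration, not the algorithm of any particular compiler. It assumes a connected graph, describes each channel as a hypothetical `(src, dst, prod, cons)` tuple whose per-phase rate lists cover one CSDF cycle of the producing and consuming actor, and assumes that all ports of an actor share the same number of phases. Because the full rates of Figure 3 are not listed in the text, the usage example checks only the B → D channel (B produces 2 tokens, D consumes 3), plus a hypothetical two-phase consumer Q.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+

def repetition_vector(edges):
    """Smallest positive firing counts per iteration that balance every channel.

    edges: list of (src, dst, prod, cons); prod/cons are the per-phase token
    rates of the producing/consuming port over one CSDF cycle (an SDF port
    is a one-element list). Assumes the graph is connected.
    """
    phases = {}
    for src, dst, prod, cons in edges:
        phases[src] = len(prod)
        phases[dst] = len(cons)
    # cycles[a]: number of complete phase cycles of actor a per iteration
    cycles = {edges[0][0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            p, c = sum(prod), sum(cons)   # tokens moved per full cycle
            if src in cycles and dst not in cycles:
                cycles[dst] = cycles[src] * p / c
                changed = True
            elif dst in cycles and src not in cycles:
                cycles[src] = cycles[dst] * c / p
                changed = True
            elif src in cycles and dst in cycles and cycles[src] * p != cycles[dst] * c:
                raise ValueError(f"inconsistent rates on channel {src}->{dst}")
    # scale the rational cycle counts to the smallest integers
    scale = lcm(*(f.denominator for f in cycles.values()))
    return {a: int(f * scale) * phases[a] for a, f in cycles.items()}

print(repetition_vector([("B", "D", [2], [3])]))     # {'B': 3, 'D': 2}
print(repetition_vector([("P", "Q", [1], [0, 1])]))  # {'P': 1, 'Q': 2}
```

The first result matches the B/D counts quoted above (3 · 2 tokens produced, 2 · 3 consumed). Counting whole CSDF cycles first and multiplying by the phase count keeps every actor's firing count a multiple of its cycle length.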

2.2 Teleport Messaging

TMG enables executing an actor B asynchronously with an actor A. The ordering constraints implied by a TM tagged with a latency, which specifies the message processing delay, must be enforced. The latency parameter determines the execution of B at which the TM handler in B is invoked. To enable this, actor B declares a message handler function. Then, the container* of actors A and B declares a portal, a special type of variable, of type portal<B>, as in Figure 4. Each portal variable is associated with a specific type, e.g., portal<B>, which is in turn associated with a specific actor type, e.g., B. A portal variable of type portal<B> can invoke message handlers declared within B. This portal

* Actors within a container are similar to objects contained within another object.
variable is then passed to the entry function of actor A to enable actor A to invoke the message handler functions of actor B with some latency parameters. For example, when actor A sends a TM to B with a latency of k during its n-th execution, the TM conceptually travels alongside the data token produced by the (n+k)-th execution of A. Once that data token reaches actor B, actor B fires the message handler before consuming the data token. Other actors, such as C, can also use the portal to send B a message.
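To make the portal mechanism concrete, here is a minimal Python analogue. It is not StreamIt syntax or its runtime: the classes `Portal` and `Multiply`, their methods, and the `at_execution` argument are hypothetical. The sender is assumed to have already computed the receiver execution at which the handler must run (Section 2.3 shows how this is done with SDEP); in this example SDEP is the identity, so a TM sent during the Source's 5th execution with latency 2 takes effect immediately before the 7th execution of the Multiply actor.

```python
from collections import defaultdict

class Multiply:
    """Receiver actor: multiplies each incoming token by its tap weight."""

    def __init__(self, w):
        self.w = w
        self.fired = 0                    # completed executions so far
        self.pending = defaultdict(list)  # execution index -> queued handler calls

    def set_weight(self, w):              # message handler
        self.w = w

    def fire(self, token):
        m = self.fired + 1                # 1-based index of the execution starting now
        for handler, args in self.pending.pop(m, []):
            handler(*args)                # downstream TM: run handler just before execution m
        self.fired = m
        return self.w * token

class Portal:
    """Hypothetical stand-in for portal<Multiply>: forwards handler calls to receivers."""

    def __init__(self, receivers):
        self.receivers = receivers

    def send(self, handler_name, args, at_execution):
        for r in self.receivers:
            r.pending[at_execution].append((getattr(r, handler_name), args))

mult = Multiply(w=1)
portal = Portal([mult])
outputs = []
for n in range(1, 9):                     # n-th execution of Source emits the token 1
    if n == 5:
        portal.send("set_weight", (10,), at_execution=n + 2)  # latency 2, SDEP = identity
    outputs.append(mult.fire(1))
print(outputs)  # [1, 1, 1, 1, 1, 1, 10, 10]
```

The data path is untouched: the Source still emits plain tokens, and the new weight reaches the receiver out of band, which is the separation of control and data that TMG aims for.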


Figure 4. Example of a teleport portal.


2.3 TM Timing with SDEP

As TMs are sent between actors, we need a way to specify when TMs should be processed by the receiving actors, because the current status of a receiving actor (at the time the TM is sent) may not be appropriate for processing the TM. In other words, we need to find the exact executions of the receiving actors at which the TMs should be processed. The SDEP function described below provides a way to find the processing time of TMs at the receiving actors.

Thies et al. [30] formally present an approach to compute the invocation of actors based on their dependencies. We borrow their formulation and use it in our dependency analysis. We briefly describe the semantics from [30] for the reader.

We first present Definition 1, the stream dependency function SDEP, which represents the data dependency of the executions of one actor on the executions of other actors in the stream graph. $\mathrm{SDEP}_{A \leftarrow B}(n)$ represents the minimum number of times actor A must execute to allow actor B to execute n times. This is based on the intuition that an execution of a downstream actor requires data from some execution of an upstream actor; thus, the execution of the downstream actor depends on some execution of the upstream actor. In other words, given the n-th execution of a downstream actor B, the SDEP function returns the latest execution of an upstream actor A whose output data, passing through and being processed by intermediate actors, affects the input data consumed by the n-th execution of actor B.

Definition 1. (SDEP)
$$\mathrm{SDEP}_{A \leftarrow B}(n) = \min_{\phi \in \Phi,\ |\phi \wedge B| = n} |\phi \wedge A|$$
where $\Phi$ is the set of all legal sequences of executions, $\phi$ is a legal sequence of executions, and $|\phi \wedge B|$ is the number of times actor B executes in the sequence $\phi$.

Using the above SDEP function, we can formally specify when TMs are processed as follows:

Definition 2. Suppose that actor A sends a TM to actor B with latency range $[k_1 : k_2]$ during the n-th execution of A. Then, we consider two cases:

• If B is downstream of A, then the message handler must be invoked in B immediately before its m-th execution, where m is constrained as follows:
$$n + k_1 \le \mathrm{SDEP}_{A \leftarrow B}(m) \le n + k_2 \tag{1}$$

• If B is upstream of A, then the message handler must be invoked in B immediately after its m-th execution, where m is constrained as follows:
$$\mathrm{SDEP}_{B \leftarrow A}(n + k_1) \le m \le \mathrm{SDEP}_{B \leftarrow A}(n + k_2) \tag{2}$$

To illustrate how SDEP is used to find the appropriate executions of receiving actors, we take the FIR example in Figure 1(a) and find the execution m of a Multiply actor that needs to process a TM. Suppose that at its 5th execution (n = 5), the Source actor sends a TM to a Multiply actor with latency $k_1 = k_2 = 2$. Then we have:
$$
\begin{aligned}
n + k_1 &\le \mathrm{SDEP}_{\mathrm{Source} \leftarrow \mathrm{Multiply}}(m) \le n + k_2\\
5 + 2 &\le \mathrm{SDEP}_{\mathrm{Source} \leftarrow \mathrm{Multiply}}(m) \le 5 + 2\\
7 &\le \mathrm{SDEP}_{\mathrm{Source} \leftarrow \mathrm{Multiply}}(m) \le 7\\
\mathrm{SDEP}_{\mathrm{Source} \leftarrow \mathrm{Multiply}}(m) &= 7
\end{aligned}
$$
Each time the Source actor fires, it produces one token, and each time a Multiply actor fires, it consumes one token and produces one token. Therefore, for a Multiply actor to fire m times, the Source actor has to fire m times; in other words, $\mathrm{SDEP}_{\mathrm{Source} \leftarrow \mathrm{Multiply}}(m) = m$. Hence, m = 7.
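Definition 2 can be evaluated directly once an SDEP function is available. In the Python sketch below, `sdep` is any callable implementing SDEP in the direction required by each case, and `max_m` is a hypothetical search bound we introduce because the downstream case is solved by scanning candidate executions. The first usage line reproduces the FIR computation above with the identity SDEP; the others use the B/D dependency of Table 1, which we assume equals $\lceil 3m/2 \rceil$ (B produces 2 tokens per firing, D consumes 3, no initial tokens).

```python
def handler_executions(sdep, n, k1, k2, downstream, max_m=1000):
    """Receiver executions m at which a TM sent during sender execution n is handled.

    downstream=True : sdep is SDEP_{sender<-receiver}; returns every m with
                      n + k1 <= sdep(m) <= n + k2 (handler runs just before m), Eq. (1).
    downstream=False: sdep is SDEP_{receiver<-sender}; returns every m with
                      sdep(n + k1) <= m <= sdep(n + k2) (handler runs just after m), Eq. (2).
    """
    if downstream:
        return [m for m in range(1, max_m + 1) if n + k1 <= sdep(m) <= n + k2]
    return list(range(sdep(n + k1), sdep(n + k2) + 1))

def identity(m):
    return m

def sdep_b_d(m):
    return -(-3 * m // 2)   # ceil(3m/2): assumed SDEP_{B<-D} of Table 1

print(handler_executions(identity, n=5, k1=2, k2=2, downstream=True))  # [7]
print(handler_executions(sdep_b_d, n=2, k1=0, k2=0, downstream=True))  # [1]
print(handler_executions(sdep_b_d, n=1, k1=0, k2=0, downstream=True))  # []
```

The empty list in the last line shows how a non-linear SDEP can leave a sender execution with no matching receiver execution for a fixed latency; a latency range $[k_1 : k_2]$ widens the window.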

2.4 SDEP Calculation

StreamIt computes the SDEP function using a simple pull schedule [30]. For brevity, we refer readers to [30] for further details on the algorithm. However, to illustrate the SDEP calculation intuitively, we take a simple example of actors B and D in Figure 3. $\mathrm{SDEP}_{B \leftarrow D}(m)$ is given in Table 1. In the example, when D does not execute, it does not require any executions of B. For D to execute the first time, it requires three tokens on its input channel. Based on this requirement, the pull schedule algorithm tries to pull three tokens from the supplier, actor B. To supply the three tokens, B has to execute at least two times; therefore, we have $\mathrm{SDEP}_{B \leftarrow D}(1) = 2$. Similarly, when D wants to execute one more time, it needs three more tokens; one token is left over from the first two executions of B, so D pulls the two missing tokens from B. Again, B has to execute one more time to supply the two tokens, and we have $\mathrm{SDEP}_{B \leftarrow D}(2) = 3$.


m | $\mathrm{SDEP}_{B \leftarrow D}(m)$
0 | 0
1 | 2
2 | 3

Table 1. Dependency Function Example
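The pull computation described above can be sketched for a pipeline of SDF actors. This Python function is our simplification of the idea, not StreamIt's implementation: it handles only pipelines with a single rate per port, each channel given as a hypothetical `(prod, cons, initial_tokens)` tuple ordered from upstream to downstream, whereas the algorithm in [30] handles general stream graphs.

```python
def sdep_pipeline(channels, n):
    """SDEP_{first<-last}(n): minimum firings of the first actor of a pipeline
    needed for the last actor to fire n times.

    channels: list of (prod, cons, initial_tokens), ordered upstream to downstream.
    Walks backwards, letting each producer fire just enough times to supply
    the tokens its consumer still lacks (a pull-style computation).
    """
    need = n
    for prod, cons, init in reversed(channels):
        missing = max(0, need * cons - init)
        need = -(-missing // prod)   # ceiling division
    return need

# B -> D channel: B produces 2 tokens per firing, D consumes 3.
print([sdep_pipeline([(2, 3, 0)], m) for m in range(3)])  # [0, 2, 3], as in Table 1
```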

Periodicity of SDEP: Because CSDF is periodic, SDEP is also periodic. This means that one does not need to compute SDEP for all executions; instead, one can compute SDEP for some executions and then rely on the periodic property of SDEP to query future dependency information. The following equation is adapted from [30]:
$$\mathrm{SDEP}_{A \leftarrow B}(n) = i \cdot |S \wedge A| + \mathrm{SDEP}_{A \leftarrow B}(n - i \cdot |S \wedge B|) \tag{3}$$
where S is the sequence of actor executions within one iteration of the stream graph, $|S \wedge A|$ is the number of executions of actor A within one iteration of its stream graph, and i is an iteration index such that $0 \le i \le p(n)$, where $p(n) = n \div |S \wedge B|$ is the number of iterations that B has completed by its n-th execution.²

² We define $a \div b = \lfloor a / b \rfloor$.
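Equation (3) means that one period of SDEP values suffices. The Python sketch below stores $\mathrm{SDEP}_{A \leftarrow B}(m)$ for $0 \le m < |S \wedge B|$ in a list `base` (a name we introduce) and evaluates Eq. (3) with the largest admissible index $i = p(n)$. It assumes, as in the B/D example without initial tokens, that the periodic relation holds from n = 0; with initial tokens it may only hold after a transient prefix, which would then have to be tabulated as well.

```python
def sdep_periodic(n, base, up_per_iter, down_per_iter):
    """Eq. (3) with i = p(n) = n // down_per_iter:
    SDEP(n) = i * up_per_iter + SDEP(n - i * down_per_iter),
    where the last term is looked up in base (SDEP(m) for 0 <= m < down_per_iter).
    """
    i = n // down_per_iter                  # the floor division of footnote 2
    return i * up_per_iter + base[n - i * down_per_iter]

# B/D example of Table 1: B fires 3 times and D 2 times per iteration.
print([sdep_periodic(n, [0, 2], 3, 2) for n in range(7)])  # [0, 2, 3, 5, 6, 8, 9]
```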
3. Execution Dependencies

The StreamIt compiler relies on the SDEP dependency function to determine when a TM handler can be invoked based on message latencies. TMs actually impose constraints on the execution order of actors on the path between senders and receivers: actors cannot execute arbitrarily whenever data are available at their inputs; rather, TM-receiving actors have to wait for possible TMs so that they do not execute too far ahead. Constraints on the executions of receiving actors in turn constrain the executions of intermediate actors (actors between TMG senders and receivers). When intermediate actors on the path between a sender and a receiver are not involved in TMG communication, in other words, when no TMs overlap, as on the right side of Figure 2, only a single constraint is imposed on execution orders, and the validity of a message latency can be easily checked using the following condition: an upstream message cannot have a negative latency.

When some intermediate actors on the path between a sender and a receiver are also involved in some other TMG communication, as on the left side of Figure 2, this scenario is called overlapping of TMs, or overlapping constraints, by Thies et al. [30]. Because additional constraints are imposed on the execution order of intermediate actors, these added constraints might make it impossible to schedule the executions of actors because of circular dependencies, as in the example in Section 1.3.

Checking this general scheduling problem with overlapping TMs is not straightforward, as the SDEP stream dependence functions are not linear (e.g., as in Table 1) due to mismatched input/output rates of actors.

To solve the above circular-dependency checking problem in the presence of overlapping TMs, we can exploit the graph unfolding technique in [?] to construct a directed graph that helps reason about dependencies between executions of actors in a stream graph.

However, to construct such a directed execution dependency graph, we first need to characterize and classify all kinds of execution dependencies.

3.1 Actor Execution Dependencies

Definition 3. Execution dependency: An execution $e_1$ of an actor is said to be dependent on another execution $e_2$ of some actor (possibly the same actor) when $e_1$ has to wait until $e_2$ has finished before it can commit its result.

From our insight, we identify three kinds of execution dependency:

• Causality dependency: The (n+1)-th execution of A has to be executed after the n-th execution of A, i.e., $A_n \prec A_{n+1}$.

• Data dependency: For two directly connected actors A and B, where actor B is downstream of actor A, the n-th execution of B, $B_n$, is data-dependent on the m-th execution of A, $A_m$, if $B_n$ consumes one or more tokens produced by $A_m$.

• Control dependency, due to TMs sent between actors: When an actor S at its n-th execution sends a TM to an actor R with latency $[k_1, k_2]$, then for all executions $R_m$ satisfying Definition 2, $R_m$ is said to be control dependent on $S_n$, as it might consume some control information from $S_n$.

3.2 Directed Execution Dependency Graph

To check for circular dependencies, we need to construct a dependency graph and check for cycles in that graph. If there is no cycle in the dependency graph, then the graph is a directed acyclic graph, and an evaluation order can be found using topological sort. Our dependency classification in the previous section enables us to construct such a directed execution dependency graph of actor executions.

The construction of such a directed dependency graph is done in two steps. First, we simply replicate the executions of actors in their original CSDF model of computation. Second, we add causality dependency, data dependency, and control dependency edges to the graph.

The first step can be done by expanding iterations of a CSDF stream graph. For example, consider the stream graph in Figure ?. Within one iteration of the whole stream graph, according to the CSDF model of computation, actors A, B, C, and D execute 6, 3, 2, and 4 times, respectively. Each execution of an actor is replicated as one vertex in the execution dependency graph, as in Figure ?. Note that we use $A^i_2$ to denote the 2nd relative execution of actor A within the i-th iteration of the stream graph; this is equivalent to the $(i \cdot 6 + 2)$-th absolute execution (counted from the beginning of the program) of actor A, as A executes 6 times in one iteration of this stream graph.
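The first construction step and the index conversion used throughout this section are easy to mechanize. The Python sketch below is illustrative only: the function names are hypothetical, a vertex $A^i_r$ is labeled `(actor, i, r)`, and we follow the formula $n = i \cdot a + r$ literally, with i and r obtained by floor division and remainder as in the footnote on relative executions.

```python
def to_relative(n, per_iter):
    """Absolute execution n -> (iteration i, relative execution r), with n = i*per_iter + r."""
    return divmod(n, per_iter)

def to_absolute(i, r, per_iter):
    """Inverse conversion: relative execution r of iteration i -> absolute execution n."""
    return i * per_iter + r

def iteration_vertices(repetitions, i):
    """Step 1: one vertex (actor, i, r) per execution of each actor in iteration i."""
    return [(actor, i, r) for actor, count in repetitions.items() for r in range(count)]

reps = {"A": 6, "B": 3, "C": 2, "D": 4}   # repetition counts quoted in the text
print(len(iteration_vertices(reps, 0)))    # 15 vertices per iteration
print(to_absolute(1, 2, reps["A"]), to_relative(8, reps["A"]))  # 8 (1, 2)
```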

In the second step, the dependency edges are derived from our dependency classification; however, we have not yet presented detailed methods for calculating them. In the following section, we show how to calculate the execution dependencies for the second step.

3.2.1 Calculating Execution Dependencies

Although we enumerated three kinds of execution dependencies in Section 3.1, so far we have not shown how to compute them. Computing causality dependencies is straightforward.

For data dependency edges, we utilize the SDEP function implemented using the pull scheduling algorithm in [30]. For any two connected actors, A upstream and B downstream, we create a dependency edge from execution $\mathrm{SDEP}_{A \leftarrow B}(n)$³ of A to the n-th execution of B, which we denote $A_{\mathrm{SDEP}_{A \leftarrow B}(n)} \prec B_n$.

Note that $A_m \prec A_{m+1}$ by the causality dependency condition; therefore, $A_m \prec B_n$ for all $1 \le m \le \mathrm{SDEP}_{A \leftarrow B}(n)$. Thus, we do not need to add dependency edges between $B_n$ and $A_m$ for $m < \mathrm{SDEP}_{A \leftarrow B}(n)$, as those dependencies are implicit and can be inferred from $A_m \prec A_{m+1}$ and $A_{\mathrm{SDEP}_{A \leftarrow B}(n)} \prec B_n$.
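Because only the latest producer execution needs an explicit edge, the data-dependency edges of a channel follow directly from its SDEP function. In this Python sketch the helper name is hypothetical, executions are 1-based as in Definition 1, each pair `(u, v)` means $u \prec v$, and the SDEP in the example is again the assumed $\lceil 3n/2 \rceil$ of Table 1 for the B/D channel.

```python
def data_edges(sdep, up, down, n_max):
    """Explicit data-dependency edges (up_{SDEP(n)}, down_n) for n = 1..n_max.

    Edges from earlier producer executions are implied by causality
    (A_m before A_{m+1}) and are therefore omitted. An SDEP value of 0
    (initial tokens suffice) yields no edge.
    """
    edges = []
    for n in range(1, n_max + 1):
        m = sdep(n)
        if m > 0:
            edges.append(((up, m), (down, n)))
    return edges

def sdep_b_d(n):
    return -(-3 * n // 2)    # ceil(3n/2): assumed SDEP_{B<-D}

print(data_edges(sdep_b_d, "B", "D", 3))
# [(('B', 2), ('D', 1)), (('B', 3), ('D', 2)), (('B', 5), ('D', 3))]
```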

Finally, control dependency edges due to TMs can also be computed using SDEP, as proposed in [30]. Suppose that an actor S sends a TM to an actor R with latencies $[k_1, k_2]$. Applying Definition 2, we have two cases:

•  If  i?  is  downstream  of  S,  then  create  an  edge  Rm  — >■  Sn 
s.t.  n  +  ki  <  SDEPs<_i{(m)  <  n  -\-  k2.  However,  be¬ 
cause  of  the  causality  dependency  and  SDEP  is  monotonic, 
e.g.  F?m|SDEPB,_B(’7i)  — Ti-l-fcl  —  ^m\S0EP sy— R(rn)=n-\-k2  ’  and 
we  would  like  to  have  as  few  dependency  edges  as  possi¬ 
ble,  therefore  we  only  need  to  add  one  dependency  edge 

l^m|SDEPB,_B(m')— Ti  +  fei  ^  Sn 

•  If  i?  is  upstream  of  S,  then  create  an  edge  Rm  — >■  Sn  such 
that  SDEPi{<_5(n  -F  fci)  <  m  <  SDEP_R<_s(n  -F  ^2).  For 
the  same  reasons  as  in  the  previous  case,  we  only  add  one 
edge  7?sdepb^s("+'=i)+i 

SDEPij<_s(n  -F  fci)  because  the  message  handler  at  R  is  in¬ 
voked  after  the  execution  SDEP_R<_s(r!,  -F  fci)  of  R. 
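The two cases can be summarized in a small helper that returns the receiver execution m of the single edge $R_m \rightarrow S_n$. This is a sketch under stated assumptions: executions are 1-based absolute indices; `sdep` is any monotonic function supplied by the caller (standing for $\mathrm{SDEP}_{S\leftarrow R}$ in the downstream case and $\mathrm{SDEP}_{R\leftarrow S}$ in the upstream case); and "the execution with $\mathrm{SDEP}_{S\leftarrow R}(m) = n + k_1$" is read as the earliest m with $\mathrm{SDEP}_{S\leftarrow R}(m) \ge n + k_1$, because SDEP is a step function and may skip the exact value. The function name is hypothetical.

```python
from typing import Callable


def control_edge_target(n: int, k1: int, sdep: Callable[[int], int],
                        downstream: bool, max_m: int = 1_000_000) -> int:
    """Return m such that the single control edge R_m -> S_n is added.

    downstream=True : sdep(m) = SDEP_{S<-R}(m); pick the earliest m whose
                      SDEP reaches n + k1 (SDEP is monotonic).
    downstream=False: sdep(x) = SDEP_{R<-S}(x); the handler runs after
                      execution SDEP_{R<-S}(n + k1) of R, hence the + 1.
    """
    if downstream:
        target = n + k1
        lo, hi = 1, max_m
        if sdep(hi) < target:
            raise ValueError("no receiver execution within max_m")
        while lo < hi:  # binary search on the monotonic step function
            mid = (lo + hi) // 2
            if sdep(mid) >= target:
                hi = mid
            else:
                lo = mid + 1
        return lo
    return sdep(n + k1) + 1


# Downstream R consumes 2 tokens per firing from S producing 1: SDEP_{S<-R}(m) = 2m.
print(control_edge_target(5, 0, lambda m: 2 * m, downstream=True))
# Upstream R produces 2 tokens per firing for S consuming 1: SDEP_{R<-S}(x) = ceil(x / 2).
print(control_edge_target(5, 1, lambda x: (x + 1) // 2, downstream=False))
```

Binary search is used only because SDEP is monotonic; a linear scan would give the same answer.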


^ To make this conversion clear, we take an example. Suppose that an actor A has executed n times since the program started and that, in each iteration of the program, A executes a times. Then the execution $A_n$ belongs to iteration $i = n \div a$ of the whole program (based on CSDF semantics), and it is the $r = (n \bmod a)$-th execution of A within that iteration. In other words, $A_n = A_r^i$. We call n the absolute execution, r the relative execution, and i the iteration index. We will use this conversion frequently in the next sections.
3.2.2  Illustrating  Example

Let us come back to the example in Figure ??; however, now we suppose that E sends TMs to A with latency 0 and B sends TMs to D with latency -1. The execution dependency graph is then as in Figure 5, where causality dependency edges are drawn as dashed arrows, data dependency edges as dash-dot arrows, and control dependency edges as solid arrows.

We use the SDEP function between actors B and D in Table 1 to illustrate our method. Given the SDEP function for B and D in Table 1, for any iteration n of the stream graph we add the data dependency edges $D_1^n \rightarrow B_2^n$ and $D_2^n \rightarrow B_3^n$, as in Figure 5, because within one iteration the first execution of D is data dependent on the second execution of B, and the second execution of D is data dependent on the third execution of B.

For control dependency edges, TMs from actor B to D have latency $k_1 = -1$, and $\mathrm{SDEP}_{B\leftarrow D}(n \cdot e(D) + 1) = (n \cdot e(B) + 3) + k_1, \forall n \in \mathbb{N}$, where $e(X)$ is the number of times actor X executes within one iteration of the stream graph; therefore, we add an edge $D_1^n \rightarrow B_3^n$. Similarly, since $\mathrm{SDEP}_{B\leftarrow D}(n \cdot e(D) + 2) = ((n+1) \cdot e(B) + 1) + k_1, \forall n \in \mathbb{N}$, we add an edge $D_2^n \rightarrow B_1^{n+1}$.
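The two control edges follow from a short calculation. Assuming e(B) = 3 and e(D) = 2 (the values implied by the data dependencies $D_1^n \rightarrow B_2^n$ and $D_2^n \rightarrow B_3^n$ above), with 0-based iteration n and absolute executions written as $n \cdot e(X) + r$:

```latex
\begin{aligned}
\mathrm{SDEP}_{B\leftarrow D}(2n+1) &= 3n + 2 = (3n + 3) + k_1
  &&\Rightarrow\; D_1^n \rightarrow B_3^n,\\
\mathrm{SDEP}_{B\leftarrow D}(2n+2) &= 3n + 3 = (3(n+1) + 1) + k_1
  &&\Rightarrow\; D_2^n \rightarrow B_1^{n+1},
\end{aligned}
\qquad k_1 = -1.
```

The second edge crosses an iteration boundary: its source is in iteration n and its target in iteration n + 1.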

Although Figure 5 only shows the dependencies for a portion of the whole infinite graph, the basic dependency structure of the execution dependency graph is the same for all iterations $n \in \mathbb{N}$, except for some initial iterations, because the CSDF model of computation is periodic.

The naive graph construction process above results in an infinite directed graph, because the execution sequence of a streaming application based on the CSDF model of computation is presumed to be infinite. Since an infinite directed graph cannot be used directly to check for cyclic dependencies among actor executions, we need a way to translate it into a finite directed graph that captures all dependency structures of the original infinite graph.

4.  Checking  for  Circular  Dependencies 

4.1  Dynamic/Periodic  Graph 

The technique for translating such infinite execution dependency graphs into a finite static graph is already available in the theory of dynamic/periodic graphs [13] [24], once we notice that the infinite execution dependency graphs are periodic in the sense of the dynamic/periodic graph definition below.

Definition 4. A directed dynamic periodic graph $G^\infty = (V^\infty, E^\infty)$ is induced by a static graph $G = (V, E, T)$, where V is the set of vertices, E is the set of edges, and $T: E \rightarrow \mathbb{Z}^k$ is a weight function on the edges of G, via the following expansion:

$V^\infty = \{ v^p \mid v \in V, p \in \mathbb{Z}^k \}$

$E^\infty = \{ (u^p, v^{p + t_{uv}}) \mid (u, v) \in E, t_{uv} \in T, p \in \mathbb{Z}^k \}$

If we interpret $t_{uv} \in T$ as a transit time representing the number of k-dimensional periods it takes to travel from u to v along the edge, then the vertex $v^p$ of $G^\infty$ can be interpreted as vertex v of G in k-dimensional period p, and the edge $(u^p, v^{p + t_{uv}})$ represents travelling from u in period p and arriving at v $t_{uv}$ periods later.

Intuitively, a k-dimensional periodic graph is obtained by replicating a basic graph (cell) in a k-dimensional orthogonal grid. Each vertex within the basic graph is connected with a finite number of other vertices in other replicated graphs, and the inter-basic-graph connections are the same for each basic graph.
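Definition 4 can be made concrete by unrolling a static graph over a finite window of periods. The sketch below is a 1-dimensional illustration only (k = 1); it follows the transit-time convention of Definition 4, i.e., an edge (u, v) with weight t connects $u^p$ to $v^{p+t}$, and it simply drops edges whose head falls outside the window. The function name is hypothetical.

```python
def unroll(static_edges, periods):
    """Finite window of G^inf for a 1-dimensional static graph.

    static_edges: iterable of (u, v, t) meaning edge (u^p, v^(p+t)) for every p.
    periods: the periods p to materialize.
    Returns the set of expanded edges ((u, p), (v, p + t)) inside the window.
    """
    window = set(periods)
    return {
        ((u, p), (v, p + t))
        for u, v, t in static_edges
        for p in window
        if p + t in window  # keep only edges whose head is inside the window
    }


# A two-vertex cell: a -> b inside a period, b -> a into the next period.
cell = [("a", "b", 0), ("b", "a", 1)]
for edge in sorted(unroll(cell, range(3))):
    print(edge)
```

Every copy of the cell receives the same inter-cell edges, which is exactly the periodicity the definition describes.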

Our observation is that the execution dependency graph of a CSDF stream graph is an infinite 1-dimensional dynamic graph whose basic graph is composed of the actor executions of the CSDF stream graph within one iteration. The basic cell is repeatedly placed in a 1-dimensional time grid. Data, causality and control dependencies form directed edges between vertices. Some of the edges created by causality and control dependencies can be inter-cell (inter-iteration) connections. As the CSDF model of computation is periodic by nature, the pattern of inter-cell (inter-iteration) connections is the same for each cell (iteration).

Based on the above observation and the theory of infinite periodic graphs, to check for cyclic dependencies in an infinite dependency graph with TM constraints, we can construct an equivalent finite static graph. We then prove that there is a cyclic dependency in the execution dependency graph of a CSDF program with added TM constraints iff the equivalent static graph has a cycle whose weight equals 0. In the next section, we show how to translate the 1-dimensional periodic graph in Figure 5 into a finite static graph.

4.2  Translating  to  Static  Finite  Equivalent  Graph 

Figure 6(a) shows the equivalent static finite dependency graph of the infinite execution dependency graph in Figure 5. In the graph, all edges that do not have a specified weight are of weight 0.

Intuitively, all the vertices within one arbitrary iteration, say iteration n, are kept to form the vertices of the equivalent static graph, but their iteration indices are removed, e.g. $A_1^n$ becomes $A_1$. Directed edges between vertices within that iteration are also kept, and their weights are set to 0. For directed edges that cross iterations, only outgoing edges (edges from this iteration to other iterations) are translated into equivalent edges in the new static graph. The translation is done as follows: if an outgoing edge is $S_x^n \rightarrow R_y^m$, then we add a directed edge $S_x \rightarrow R_y$ with weight $n - m$. The value $n - m$ is called the relative iteration; it is the gap between the iterations of the two actor executions. For example, the directed edge $D_2^n \rightarrow B_1^{n+1}$ in Figure 5 becomes the edge $D_2 \rightarrow B_1$ with weight -1 in Figure 6(a). Note that an edge $S_x \rightarrow R_y$ is equivalent to any edge $S_x^i \rightarrow R_y^j$ in the execution dependency graph as long as $i - j = n - m$, because of the repetitive nature of the CSDF model of computation.
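The folding step can be sketched as follows. Assumptions: executions are given as (actor, relative execution, iteration) triples, every edge points from the dependent execution to the one it depends on, and the weight is the source iteration minus the target iteration, as defined above. This is the negation of the transit time in Definition 4; since negating every weight maps zero cycles to zero cycles, either convention works for cycle detection as long as it is used consistently. The function name is hypothetical.

```python
def fold(edges, n):
    """Fold edges of an infinite execution dependency graph into static edges.

    edges: iterable of ((S, x, i), (R, y, j)) meaning S_x^i -> R_y^j.
    n: the reference iteration; only edges leaving iteration n are kept.
    Returns a set of ((S, x), (R, y), weight) with weight = i - j.
    """
    static = set()
    for (s, x, i), (r, y, j) in edges:
        if i == n:  # edges whose source lies in the chosen iteration
            static.add(((s, x), (r, y), i - j))
    return static


# Edges from the running example around iteration 7.
infinite_edges = [
    (("D", 1, 7), ("B", 3, 7)),  # same iteration -> weight 0
    (("D", 2, 7), ("B", 1, 8)),  # into the next iteration -> weight -1
    (("D", 2, 6), ("B", 1, 7)),  # same pattern one iteration earlier: not re-added
]
print(sorted(fold(infinite_edges, 7)))
```

Choosing a different reference iteration yields the same static edges, which is the repetitive structure the paragraph relies on.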

4.3  Graph  Equivalence 

We restate Lemma 1 of [24] as Lemma 1 below.

Lemma 1. Let $G = (V, E, T)$ be a static graph. For $u, v \in V$ and $p, l \in \mathbb{Z}$, there is a one-to-one canonical correspondence between the set of finite paths from $u^p$ to $v^l$ in $G^\infty$ and the set of paths in G from u to v with transit time $l - p$.

The  above  lemma  is  useful  for  proving  the  following  theorem: 

Theorem 1. A dependency circle in an execution dependency graph is equivalent to a cycle of zero length (total weight) in the equivalent static graph.

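Proof: Suppose that there is a circle in an execution dependency graph, say $X_{i_1}^{n_1} \rightarrow X_{i_2}^{n_2} \rightarrow \dots \rightarrow X_{i_m}^{n_m} \rightarrow X_{i_1}^{n_1}$. By Lemma 1, this circle is equivalent to a directed circle in the static graph with edges $(X_{i_1} \rightarrow X_{i_2}), (X_{i_2} \rightarrow X_{i_3}), \dots, (X_{i_{m-1}} \rightarrow X_{i_m}), (X_{i_m} \rightarrow X_{i_1})$ of weights $(n_1 - n_2), (n_2 - n_3), \dots, (n_{m-1} - n_m), (n_m - n_1)$, respectively. The sum of the weights along the equivalent directed circle is $(n_1 - n_2) + (n_2 - n_3) + \dots + (n_{m-1} - n_m) + (n_m - n_1) = 0$. Conversely, by Lemma 1, a cycle of total weight 0 through a vertex $X_{i_1}$ of the static graph corresponds to a path from $X_{i_1}^{n}$ back to $X_{i_1}^{n}$ in the execution dependency graph, i.e., a dependency circle.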

4.4  Detecting  Zero  Cycles

We have shown that a circle of execution dependencies is equivalent to a cycle of zero weight in the equivalent graph. However, we have not shown how a zero-weight cycle is detected. In [?], Iwano and Steiglitz propose an algorithm for detecting zero cycles in a 1-dimensional^ static graph with complexity $O(n^3)$ (Theorem 4 therein), where n is the number of vertices in the 1-dimensional static graph.

^ Distances between vertices have only one dimension.

Figure 5. The execution dependency trace of a CSDF stream graph with teleport messaging is an infinite periodic directed execution dependency graph

(a) Static equivalent execution dependency graph

(b) Static execution dependency graph without zero cycles

Figure 6. Static graphs
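For experimentation, the sketch below decides whether a 1-dimensional static graph has a closed walk of total weight 0 (equivalently, a cycle in $G^\infty$). It is not the Iwano and Steiglitz algorithm; it is a simpler substitute based on the following characterization. Inside one strongly connected component, a zero-weight closed walk exists iff the component contains cycles of both signs (a positive and a negative cycle can be repeated until they cancel), or, when all cycles share one sign, iff some cycle consists only of edges that are tight under Bellman-Ford potentials. Vertex names are arbitrary hashables, weights are integers, and all function names are hypothetical.

```python
from collections import defaultdict


def _sccs(vertices, edges):
    """Tarjan's strongly connected components; edges are (u, v, w)."""
    adj = defaultdict(list)
    for u, v, _ in edges:
        adj[u].append(v)
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in vertices:
        if v not in index:
            visit(v)
    return comps


def _potentials(comp, edges):
    """Bellman-Ford from a virtual source; None if a negative cycle exists."""
    d = {v: 0 for v in comp}
    for _ in range(len(comp)):
        changed = False
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                changed = True
        if not changed:
            return d
    return None


def _has_cycle(comp, arcs):
    """Plain DFS cycle detection on unweighted arcs restricted to comp."""
    adj = defaultdict(list)
    for u, v in arcs:
        adj[u].append(v)
    state = dict.fromkeys(comp, 0)  # 0 = new, 1 = on DFS path, 2 = finished

    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and dfs(v)):
                return True
        state[u] = 2
        return False

    return any(state[v] == 0 and dfs(v) for v in comp)


def has_zero_cycle(vertices, edges):
    """True iff the static graph has a closed walk of total weight 0."""
    for comp in _sccs(vertices, edges):
        inner = [(u, v, w) for u, v, w in edges if u in comp and v in comp]
        if not inner:
            continue  # single vertex without a self-loop: no cycle at all
        d_pos = _potentials(comp, inner)  # None => some negative cycle
        neg = [(u, v, -w) for u, v, w in inner]
        d_neg = _potentials(comp, neg)  # None => some positive cycle
        if d_pos is None and d_neg is None:
            return True  # cycles of both signs can be combined to weight 0
        # All cycles share one sign: a zero cycle must use only tight edges.
        d, es = (d_pos, inner) if d_pos is not None else (d_neg, neg)
        tight = [(u, v) for u, v, w in es if d[u] + w == d[v]]
        if _has_cycle(comp, tight):
            return True
    return False


print(has_zero_cycle(["x", "y"], [("x", "y", 1), ("y", "x", -1)]))
```

The recursion in the DFS helpers is fine for small graphs; a production version would use iterative traversals.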

4.5  Illustrating  Example 

To illustrate the cycle checking method, we take the translated static equivalent graph in Figure 6(a). Running the cycle detection algorithm of Iwano and Steiglitz [?] detects a zero cycle that passes through $A_1$, $D_2$ and $B_1$; this zero cycle in the static graph is equivalent to a cycle of dependencies in the execution dependency graph in Figure 5.

Now, when the latency of TMs from E to A is -4, the equivalent static graph is as shown in Figure 6(b). The graph has no zero cycles; thus, there is no circle of dependencies in the execution dependency graph, and the set of constraints imposed by teleport communication is feasible.

5.  Direct  Construction  of  Static  Graphs 

In the previous section, we assumed that we had already constructed an infinite dynamic execution dependency graph, and showed how to convert it into a finite static equivalent graph. However, it is neither possible nor useful to construct an infinite graph from a stream program. Instead, we can use a similar mechanism to construct the static equivalent graph directly from a stream program, since we know the repetitive dependency structures across iterations of a CSDF stream graph. Essentially, we construct the weight function T of $G = (V, E, T)$ directly from a CSDF stream graph. Algorithm 1 shows how the direct translation process works.

Similar to infinite directed dependency graphs, equivalent static graphs are constructed in two steps. First, the executions of each actor are replicated as many times as the actor fires within one iteration of the CSDF model; each execution becomes one vertex in the equivalent static graph. The next step is to add dependency edges between vertices based on the three kinds of dependencies enumerated in Section ?? and the methods presented in Section 3.2.1.

In Algorithm 1, the get_num_reps function returns the number of repetitions of an actor within one iteration of the whole stream graph. The function compute_rel_iter_exe computes the relative iteration $i_r$ and the relative execution $e_r$; we elaborate on the meaning of these return values below.
Algorithm 1 Algorithm for Constructing Static Equivalent Graphs

(V, E, T) ← (∅, ∅, ∅)
sched ← compute_CSDF_schedule(streamGraph)
{Create the set of vertices}
for all actor do
    for exe = 1 → sched.get_num_reps(actor) do
        {Each vertex is one actor execution in one iteration}
        V ← V + new_vertex(actor, exe)
    end for
end for
{Add data dependency edges}
for all v ∈ V do
    for all actor ∈ upstream_actors(v.actor) do
        absolute_exe ← SDEP_{actor←v.actor}(v.exe)
        {Translate from absolute execution to relative one}
        iteration ← (absolute_exe - 1) ÷ sched.get_num_reps(actor)
        exe ← 1 + (absolute_exe - 1) mod sched.get_num_reps(actor)
        u ← get_vertex(actor, exe)
        {Edges point from the dependent execution to its dependency}
        e ← new_edge(v, u)
        E ← E + e
        T ← T + (weight(e) ← -iteration)
    end for
end for
{Add causality dependency edges}
for all r ∈ V do
    if r.exe > 1 then
        {Edges within one CSDF iteration have weight 0}
        u ← get_vertex(r.actor, r.exe - 1)
        e ← new_edge(r, u)
        E ← E + e
        T ← T + (weight(e) ← 0)
    else
        {Edges to previous CSDF iterations have weight 1}
        u ← get_vertex(r.actor, sched.get_num_executions(r.actor))
        e ← new_edge(r, u)
        E ← E + e
        T ← T + (weight(e) ← 1)
    end if
end for
{Add control dependency edges}
for all actor do
    if send_teleport_msg(actor) then
        for exe = 1 → sched.get_num_reps(actor) do
            for all recv ∈ get_teleport_receivers(actor) do
                {Get minimal teleport message latency}
                k1 ← get_min_latency(actor, recv)
                {Determine relative iterations and}
                {relative executions of teleport receivers}
                (i_r, e_r) ← compute_rel_iter_exe(actor, recv, exe, k1)
                s ← get_vertex(actor, exe)
                r ← get_vertex(recv, e_r)
                {The receiver execution depends on the sender execution}
                e ← new_edge(r, s)
                E ← E + e
                {Relative iteration of each receiver}
                {is the edge weight}
                T ← T + (weight(e) ← i_r)
            end for
        end for
    end if
end for
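A compact Python rendering of Algorithm 1 may help when experimenting. It is a sketch, not the Streamit backend: the input format (a repetitions map, channels with a caller-supplied SDEP function, teleport declarations with the minimal latency $k_1$), the sender reference iteration `base_iter` (standing in for the "large enough" $i^S$ of Section 5.1), the reading of the downstream rule as "earliest m with $\mathrm{SDEP}_{S\leftarrow R}(m) \ge n + k_1$", and all names are assumptions. Edges point from the dependent execution to the execution it depends on, with weight equal to source iteration minus target iteration, as in Section 4.2.

```python
from typing import Callable, Dict, List, Tuple

Sdep = Callable[[int], int]


def to_relative(absolute: int, reps: int) -> Tuple[int, int]:
    """1-based absolute execution -> (0-based iteration, 1-based relative execution)."""
    return (absolute - 1) // reps, 1 + (absolute - 1) % reps


def build_static_graph(reps: Dict[str, int],
                       channels: List[Tuple[str, str, Sdep]],
                       teleports: List[Tuple[str, str, int, bool, Sdep]],
                       base_iter: int = 100):
    """Return the set of (src, dst, weight) edges of the static equivalent graph.

    channels:  (up, down, sdep) with sdep(n) = SDEP_{up<-down}(n).
    teleports: (sender, receiver, k1, downstream, sdep) where sdep is
               SDEP_{sender<-receiver} if downstream else SDEP_{receiver<-sender}.
    """
    edges = set()
    # Data dependency edges: downstream firing (iteration 0) -> last needed upstream firing.
    for up, down, sdep in channels:
        for exe in range(1, reps[down] + 1):
            absolute = sdep(exe)
            if absolute <= 0:
                continue  # covered by initial tokens
            it, rel = to_relative(absolute, reps[up])
            edges.add(((down, exe), (up, rel), -it))
    # Causality dependency edges: each firing depends on the previous one.
    for actor, k in reps.items():
        for exe in range(2, k + 1):
            edges.add(((actor, exe), (actor, exe - 1), 0))
        edges.add(((actor, 1), (actor, k), 1))  # last firing of the previous iteration
    # Control dependency edges: receiver firing -> sender firing.
    for s, r, k1, downstream, sdep in teleports:
        for exe in range(1, reps[s] + 1):
            n = base_iter * reps[s] + exe  # sender firing in iteration base_iter
            if downstream:
                m = 1
                while sdep(m) < n + k1:  # earliest m with SDEP_{S<-R}(m) >= n + k1
                    m += 1
            else:
                m = sdep(n + k1) + 1
            it_r, rel_r = to_relative(m, reps[r])
            edges.add(((r, rel_r), (s, exe), it_r - base_iter))
    return edges


def sdep_b_from_d(m: int) -> int:
    """SDEP_{B<-D} implied by the running example: D_1 needs B_2, D_2 needs B_3."""
    if m <= 0:
        return 0
    it, rel = divmod(m - 1, 2)
    return 3 * it + (2 if rel == 0 else 3)


graph = build_static_graph(
    reps={"B": 3, "D": 2},
    channels=[("B", "D", sdep_b_from_d)],
    teleports=[("B", "D", -1, True, sdep_b_from_d)],
)
for edge in sorted(graph):
    print(edge)
```

On this fragment the sketch reproduces the two control edges derived in Section 3.2.2, $D_1 \rightarrow B_3$ with weight 0 and $D_2 \rightarrow B_1$ with weight -1.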


Suppose that a teleport sender S at its absolute execution n is sending a message to a receiver R with minimum latency $k_1$. Then the absolute execution m of the receiver R is computed as in Section 3.2.1. We then use the conversion from absolute executions of actors to relative executions and iterations as in Section ??. Suppose that S and R execute s and r times, respectively, within one iteration of the stream graph; then the relative executions and iterations of S and R for their executions $S_n$ and $R_m$ are computed as follows:

$r^S = n \bmod s, \quad i^S = n \div s$

$r^R = m \bmod r, \quad i^R = m \div r$

Then $i_r = i^R - i^S$ and $e_r = r^R$.


5.1  Uniqueness  of  Constructed  Static  Graph

Lemma 2. $i_r$ is the same for all $i^S$.

Proof: Consider the same relative execution of S j iterations later in the stream graph, i.e., in iteration $i^S + j$; then the absolute execution of S is $r^S + (i^S + j) \cdot s = n + s \cdot j$. We have two cases:

•  If R is downstream of S: since CSDF is periodic, if $\mathrm{SDEP}_{S\leftarrow R}(m) = n + k_1$, then $\mathrm{SDEP}_{S\leftarrow R}(m + r \cdot j) = n + s \cdot j + k_1$ based on Equation ??. Thus, $i_r = ((m + r \cdot j) \div r) - ((n + s \cdot j) \div s) = (i^R + j) - (i^S + j) = i^R - i^S$.

•  If R is upstream of S, then $m = \mathrm{SDEP}_{R\leftarrow S}(n + k_1)$. As CSDF is periodic, $m + r \cdot j = \mathrm{SDEP}_{R\leftarrow S}(n + s \cdot j + k_1)$ based on Equation ??. Thus, $i_r = ((m + r \cdot j) \div r) - ((n + s \cdot j) \div s) = (i^R + j) - (i^S + j) = i^R - i^S$.


Lemma 2 gives us a method to compute the relative iteration $i_r$. As $i_r$ is the same for all $i^S$, we can take an arbitrary $i^S$ that is large enough that we can find the execution m of R from the execution n of S, where $n = i^S \cdot s + r^S$, as in Section 3.2.1. Based on the m found, we can calculate $i^R$ and $r^R$ as shown above, and subsequently find $i_r$ and $e_r$.
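Lemma 2 is easy to check numerically. The sketch below recomputes $(i_r, e_r)$ for a downstream receiver over several sender iterations $i^S$, using 1-based executions as in Algorithm 1 and, as an assumption, the earliest m with $\mathrm{SDEP}_{S\leftarrow R}(m) \ge n + k_1$. It uses the running example's SDEP between B and D ($D_1^n$ needs $B_2^n$, $D_2^n$ needs $B_3^n$, so s = 3, r = 2) with $k_1 = -1$. The result is constant once $i^S \ge 1$; at $i^S = 0$ the first firing of B gives $n + k_1 = 0$, which lies before the start of the program, illustrating why $i^S$ must be "large enough". Function names are hypothetical.

```python
def rel_iter_exe(sender_exe, i_s, s, r, k1, sdep_s_from_r):
    """(i_r, e_r) for a downstream receiver, sender firing r^S = sender_exe in iteration i_s."""
    n = i_s * s + sender_exe  # absolute execution of the sender
    m = 1
    while sdep_s_from_r(m) < n + k1:  # earliest receiver firing reaching n + k1
        m += 1
    i_recv, r_recv = (m - 1) // r, 1 + (m - 1) % r
    return i_recv - i_s, r_recv


def sdep_b_from_d(m):
    """SDEP_{B<-D} of the running example: D_1 needs B_2, D_2 needs B_3."""
    if m <= 0:
        return 0
    it, rel = divmod(m - 1, 2)
    return 3 * it + (2 if rel == 0 else 3)


for i_s in range(5):
    print(i_s, rel_iter_exe(1, i_s, s=3, r=2, k1=-1, sdep_s_from_r=sdep_b_from_d))
```

For the first firing of B the pair settles at $(-1, 2)$, i.e., the static edge $D_2 \rightarrow B_1$ with weight -1.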


Theorem  2.  The  constructed  static  graph  from  a  CSDF  stream 
graph  is  unique. 

Proof: As all steps in Algorithm 1 are deterministic, the resulting static graph is unique for a given CSDF stream graph.


6.  Finding  Execution  Schedule 

We have shown our technique for checking for circular dependencies in a CSDF stream graph, but we have not yet shown how to find a schedule of actor executions under the constraints imposed by the latencies of teleport communications.

6.1  Auto-discovered  Schedule 

However, we observe that even in the presence of overlapping TMs, the current scheduling method in Streamit still works without modification. Essentially, a successful circular dependency check shows that there exists at least one execution order that satisfies the latency constraints of TMs as implemented in Streamit, which is based on a pull scheduling algorithm.

6.2  Finding  Schedule  with  Topological  Sort  of  Actor  Executions

Although a schedule can be discovered automatically when actors run, it is beneficial to precompute schedules, as they may help optimize program performance.

One might think that, since we already have execution dependency graphs, we can run traditional topological sort algorithms on the graphs to obtain a schedule. However, the execution dependency graphs are infinite, so a naive sorting approach will not work, as it would never finish. Instead, we need a systematic numbering method that assigns values to vertices based on their indices and the structure of the static graph.

We employ the topological sort method for acyclic dynamic periodic graphs described in Section 3.8 of [?]. For a 1-dimensional acyclic periodic graph $G^\infty$, we can calculate a value $\lambda(v^p)$ for each vertex in the periodic graph such that if $v^p$ depends on $u^l$ in the periodic graph, then $\lambda(u^l) < \lambda(v^p)$. The calculation is done by first constructing a static graph $G = (V, E, T)$ from $G^\infty$ and then solving the following linear program:

$\min \sum_{(u,v) \in E} \sigma_{uv}$

subject to

$\pi_v - \pi_u + \gamma \tau_{uv} \ge 0 \quad \forall (u, v) \in E$

$\pi_v - \pi_u + \gamma \tau_{uv} + \sigma_{uv} \ge 1 \quad \forall (u, v) \in E$

$\sigma_{uv} \ge 0 \quad \forall (u, v) \in E$

where $\tau_{uv}$ is the weight of edge $(u, v) \in E$.

The above linear program has a unique optimal solution for an acyclic periodic graph [?]; let us call it $(\sigma^*, \pi^*, \gamma^*)$. Then the value assignment for each vertex is as follows:

$\lambda(v^p) = \pi_v^* - \gamma^* p \quad \forall v \in V \qquad (4)$
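Solving the linear program requires an LP solver. As a lightweight cross-check for a bounded prefix of the execution, one can instead unroll the static graph over a few iterations and run Kahn's topological sort. This is a different, simpler technique than the LP numbering above and only orders a finite prefix. It assumes the Section 4.2 convention (an edge (u, v, w) means u in iteration p depends on v in iteration p - w), ignores dependencies on iterations before the program starts, and leaves out executions whose dependencies fall beyond the prefix. The example edges are the B and D dependencies discussed in Section 3.2.2 plus their causality edges; the function name is hypothetical.

```python
from collections import defaultdict, deque


def prefix_schedule(static_edges, vertices, iterations):
    """Topological order of (vertex, iteration) pairs for iterations 0..iterations-1.

    static_edges: (u, v, w) meaning u in iteration p depends on v in iteration p - w.
    Executions depending on an iteration beyond the prefix are never emitted,
    and neither is anything that (transitively) waits for them.
    """
    nodes = [(v, p) for p in range(iterations) for v in vertices]
    indeg = dict.fromkeys(nodes, 0)
    succ = defaultdict(list)
    for u, v, w in static_edges:
        for p in range(iterations):
            q = p - w
            if q < 0:
                continue  # dependency on an iteration before the program starts
            indeg[(u, p)] += 1
            if q < iterations:
                succ[(v, q)].append((u, p))  # released once (v, q) is scheduled
    ready = deque(x for x in nodes if indeg[x] == 0)
    order = []
    while ready:
        x = ready.popleft()
        order.append(x)
        for y in succ[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                ready.append(y)
    return order


# B/D fragment of the running example (weights = source iteration - target iteration).
EDGES = [("D1", "B2", 0), ("D2", "B3", 0), ("D1", "B3", 0), ("D2", "B1", -1),
         ("B2", "B1", 0), ("B3", "B2", 0), ("B1", "B3", 1),
         ("D2", "D1", 0), ("D1", "D2", 1)]
print(prefix_schedule(EDGES, ["B1", "B2", "B3", "D1", "D2"], iterations=3))
```

Because $D_2^n$ depends on $B_1^{n+1}$, the last copy of $D_2$ in the prefix is left out until the next iteration is unrolled.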

7.  Experiment 

We have implemented the algorithms presented in this paper as a Streamit backend that checks for circular dependencies and topologically sorts actor executions.

Our circular dependency checking algorithm correctly detects the invalid sets of TMs in the examples we presented and generates the same static dependency graphs.

For the example without circular dependencies, where the latency of TMs from E to A is -4 and that of TMs from B to D is -1, as in Figure 6(b), the topological sorting algorithm using linear programming finds $\gamma^* = 8$ and values $\pi_v^*$ between 0 and 13 for the vertices of the static graph. Based on those values, we can find an execution order of actors for the graph that does not violate the dependency constraints.

8.  Related  Work 

Thies et al. [30] introduce TM as a mechanism to relax the rigidity of CSDF. They present an analysis that computes the processing time of TMs. However, this analysis is applicable only to non-overlapping TMs. Consequently, CSDF graphs with overlapping TMs cannot use this analysis. We address this limitation with our dependency analysis method, which is applicable to any CSDF graph with TMs. Furthermore, we show that it is possible to compute schedules for CSDF graphs with arbitrary TM structures, which is not well-defined in the work by Thies et al. [30].

Our method is closely related to the method by Rao and Kailath [26] for scheduling digital signal processing algorithms on arrays of processors, where they use reduced dependence graphs, similar to periodic graphs, as a proxy for scheduling and mapping computation operators onto arrays of processors. However, their work mainly focuses on a class of applications equivalent to homogeneous SDF applications.^ Our work targets a more general class of signal processing applications with sporadic control messages that require synchronization between processes.

^ Actors in homogeneous SDF applications only produce/consume one data token at each output/input port whenever they execute.

Zhou and Lee [?] tackle circular dependency analysis using causality interfaces. The primary focus of their work is on dataflow models. For the SDF case, they do not account for control messages in their SDF models. Moreover, we also support dependency analysis for CSDF with overlapping TMs.

Horwitz et al. [?] propose a method for interprocedural program slicing by constructing dependency graphs between program statements with data and control dependency edges. Their graph construction method is similar to ours in that they construct dependency graphs of statements and use the graphs to find dependencies between interprocedural statements. However, our work exploits the semantics of the domain-specific CSDF program abstraction to analyze a specific criterion, whereas interprocedural program slicing is a more general approach that can be used in testing, debugging, or verification.

The deadlock analysis method for communicating processes by Brookes and Roscoe [4] characterizes the properties and structures of networks of communicating processes that can cause deadlocks, for example, which kinds of communication can cause the dining philosophers problem. However, the method only works for certain network structures of processes.

There are several works on deadlock detection in distributed systems. However, the algorithms proposed in those works mainly focus on detecting deadlocks when they happen rather than on deadlock avoidance and static deadlock analysis.

Wang et al. propose a method to avoid potential deadlocks in multithreaded programs that share resources. The method translates control flow graphs into Petri nets and uses Petri net theory to find potential deadlocks. Control logic code is then synthesized to avoid them. This bears some similarity to our construction of dependency graphs to detect deadlocks; in our method, if there is no deadlock, actor executions are topologically sorted to find a suitable execution order.

Finally, as an extension of TMG in the Streamit compiler, our work is based on SDF/CSDF [3] [?] semantics. We also adopt several results from dynamic/periodic infinite graphs [13] [16] [17] [24] and apply them to the TM circular dependency checking and actor execution sorting problems.


9.  Conclusion 

In this paper, we have introduced a method for checking for invalid sets of specifications in the Streamit language by exploiting its periodic nature. The method is also applicable to other scheduling problems that have periodic dependencies.

We have implemented the method as a backend of the Streamit compiler; however, we had not finished the code generation phase for Streamit applications at the time of writing. Our future work is to implement code generation and evaluate the effectiveness of TMG, as well as to study applications with overlapping TMs. Furthermore, in the Streamit compiler's backend, actors involved in TMG are not clustered, because fused actors cannot be identified and fusing actors could cause false dependencies; therefore, the generated code may not be efficient. A modular code generation method such as the one proposed by Lublinerman et al. [20] could avoid the problem of false dependencies caused by fusing actors.

As TMG's semantics target the CSDF model of computation, it would be interesting to extend the work to the MDSDF model of computation for image processing applications as in [28], since MDSDF is a more natural way to express and compute image and video processing applications.